Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Oct 24, 2019

Make the delimiting chunker a common facility used by CSV and JSON.

@pitrou
Copy link
Member Author

pitrou commented Oct 24, 2019

@bkietz

@github-actions
Copy link

@bkietz bkietz self-requested a review October 24, 2019 15:18
@pitrou
Copy link
Member Author

pitrou commented Oct 24, 2019

@pitrou pitrou force-pushed the ARROW-6825-delimited branch 2 times, most recently from f487461 to c2804d0 Compare October 24, 2019 15:43
Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very elegant. Just a few comments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is empty enough to fold into json/reader.cc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chunker.h is small, but chunker.cc is non-trivial IMHO.

@fsaintjacques fsaintjacques self-requested a review October 24, 2019 16:07
Make the delimiting chunker a common facility used by CSV and JSON.
@pitrou pitrou force-pushed the ARROW-6825-delimited branch from c2804d0 to 68a5a02 Compare November 4, 2019 13:36
@pitrou
Copy link
Member Author

pitrou commented Nov 4, 2019

@bkietz I think I've addressed all your comments.
@fsaintjacques Are you still willing to review?

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bkietz bkietz closed this in 21ca13a Nov 4, 2019
@pitrou pitrou deleted the ARROW-6825-delimited branch November 4, 2019 17:10
// Two blocks
auto csv1 = MakeCSVData({"ab,cd\n"});
auto csv2 = MakeCSVData({"ef,"});
AssertParseFinal(parser, {csv1, csv2});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou I'm getting 'ambiguous function call' compilation errors with this and AssertParseOk(line 241). Not sure why this was not caught in CI builds. I am building with tests, gandiva, jni ON. Could you please take a look and let me know if I should set any compiler flags? Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you post the full compiler output?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`
/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:241:5: error: call to 'AssertParseOk' is ambiguous

AssertParseOk(parser, {csv1, csv2});

^~~~~~~~~~~~~

/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:138:6: note: candidate function

void AssertParseOk(BlockParser& parser, const std::string& str) {

 ^

/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:144:6: note: candidate function

void AssertParseOk(BlockParser& parser, const std::vectorutil::string_view& data) {

 ^

/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:393:3: error: call to 'AssertParseFinal' is ambiguous

AssertParseFinal(parser, {csv1, csv2});

^~~~~~~~~~~~~~~~

/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:150:6: note: candidate function

void AssertParseFinal(BlockParser& parser, const std::string& str) {

 ^

/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:156:6: note: candidate function

void AssertParseFinal(BlockParser& parser, const std::vectorutil::string_view& data) {

 ^

`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try the following patch?

diff --git a/cpp/src/arrow/csv/parser_test.cc b/cpp/src/arrow/csv/parser_test.cc
index f988f3ce2..418340f54 100644
--- a/cpp/src/arrow/csv/parser_test.cc
+++ b/cpp/src/arrow/csv/parser_test.cc
@@ -135,16 +135,25 @@ Status ParseFinal(BlockParser& parser, const std::string& str, uint32_t* out_siz
   return parser.ParseFinal(util::string_view(str), out_size);
 }
 
+std::vector<util::string_view> ViewsFromStrings(const std::vector<std::string>& data) {
+  std::vector<util::string_view> views(data.size());
+  for (size_t i = 0; i < data.size(); ++i) {
+    views[i] = data[i];
+  }
+  return views;
+}
+
 void AssertParseOk(BlockParser& parser, const std::string& str) {
   uint32_t parsed_size = static_cast<uint32_t>(-1);
   ASSERT_OK(Parse(parser, str, &parsed_size));
   ASSERT_EQ(parsed_size, str.size());
 }
 
-void AssertParseOk(BlockParser& parser, const std::vector<util::string_view>& data) {
+void AssertParseOk(BlockParser& parser, const std::vector<std::string>& data) {
   uint32_t parsed_size = static_cast<uint32_t>(-1);
-  ASSERT_OK(parser.Parse(data, &parsed_size));
-  ASSERT_EQ(parsed_size, TotalViewLength(data));
+  auto views = ViewsFromStrings(data);
+  ASSERT_OK(parser.Parse(views, &parsed_size));
+  ASSERT_EQ(parsed_size, TotalViewLength(views));
 }
 
 void AssertParseFinal(BlockParser& parser, const std::string& str) {
@@ -153,10 +162,11 @@ void AssertParseFinal(BlockParser& parser, const std::string& str) {
   ASSERT_EQ(parsed_size, str.size());
 }
 
-void AssertParseFinal(BlockParser& parser, const std::vector<util::string_view>& data) {
+void AssertParseFinal(BlockParser& parser, const std::vector<std::string>& data) {
   uint32_t parsed_size = static_cast<uint32_t>(-1);
-  ASSERT_OK(parser.ParseFinal(data, &parsed_size));
-  ASSERT_EQ(parsed_size, TotalViewLength(data));
+  auto views = ViewsFromStrings(data);
+  ASSERT_OK(parser.ParseFinal(views, &parsed_size));
+  ASSERT_EQ(parsed_size, TotalViewLength(views));
 }
 
 void AssertParsePartial(BlockParser& parser, const std::string& str,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same error with the patch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enclosing elements of initializer list in braces seems to work. I pushed a patch - https://github.com/apache/arrow/pull/5791/files . Please merge if it looks alright. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants